ABSTRACT


Credibility theory refers to the use of linear least-squares theory to approximate the Bayesian forecast of the mean of a future observation; families are known where the credibility formula is exact Bayesian. Second-moment forecasts are also of interest, for example, in assessing the precision of the mean estimate. For some of these same families, the second-moment forecast is exact in linear and quadratic functions of the sample mean. On the other hand, for the normal distribution with normal-gamma prior on the mean and variance, the exact forecast of the variance is a linear function of the sample variance and the squared deviation of the sample mean from the prior mean. Bühlmann has given a credibility approximation to the variance in terms of the sample mean and sample variance.

In this paper, we present a unified approach to estimating both first and second moments of future observations using linear functions of the sample mean and two sample second moments; the resulting least-squares analysis requires the solution of a 3 × 3 linear system, using 11 prior moments from the collective and giving joint predictions of all moments of interest. Previously developed special cases follow immediately. For many analytic models of interest, one can replace the 3-dimensional joint prediction with three independent credibility forecasts using the "natural" statistics for each moment.


CREDIBILITY  APPROXIMATIONS  FOR  BAYESIAN 
PREDICTION  OF  SECOND  MOMENTS 

by 

William S. Jewell and René Schnieper

0.  INTRODUCTION 

In applications of Bayesian prediction, it is often difficult or extravagant to compute the entire predictive distribution; for example, the underlying likelihood and prior densities may be empirical, with only a few moments known with any degree of reliability. Also, the decision structure may depend only upon the first few moments, instead of upon the total shape of the predictive density. Finally, the need for repeated recalculation of forecasting formulae may argue for simple, easy-to-compute results.

A case in point is actuarial science, where the fair premium (predictive mean) is the point estimator of basic importance. To this may be added fluctuation loadings, which are given functions of the predictive second moment, the variance, or the standard deviation (see, e.g., Gerber (1980)). Credibility theory is the name given by actuaries to approximations of Bayesian predictors by formulae that are linear in the data, chosen to minimize quadratic Bayes risk. Thus, credibility formulae are linear least-squares predictors, and are akin to the classical estimators of that type, as well as to the linear filters used in electrical engineering.

The main emphasis of credibility theory thus far has been on approximating the predictive mean, under a wide variety of different model assumptions (see, inter alia, Norberg (1979), Jewell (1980)). For many simple models used in practice, the linear credibility predictor of the mean is exactly the Bayesian conditional mean; in other situations, the credibility formula is usually quite robust.


The related development of credibility formulae to predict second moments is much less advanced, in part because there exist no correspondingly simple exact Bayesian predictive formulae, and in part because the least-squares development of the credibility form is messy and tedious; this problem is the focus of this paper.

After reviewing the theory for exact and approximate forecasts of the mean, we shall summarize the known exact results for second-moment predictions. Then, after defining the various moments up to order four that are needed in a second-moment prediction, we cast our one-dimensional problem into a three-dimensional credibility formula that simultaneously finds point predictors for the first and second moments of interest as linear combinations of three "natural" statistics of the data. By analogy with traditional multidimensional credibility theory, we are then able to analyze the asymptotic behaviour of the different prediction components, and to argue for the robustness of simplified versions of these forecasts. Finally, we consider various special cases that are important in modelling risk problems.

1. BASIC MODEL AND NOTATION

Consider the usual Bayesian setup, in which a random observable, $\tilde{x}$, depends upon an unknown parameter, $\tilde{\theta}$, through a (discrete or continuous) likelihood density, $p(x \mid \theta)$. In the experiment of interest, $\tilde{\theta}$ is fixed at some unknown and unobservable value $\theta$, but the parameter has a known prior density, $p(\theta)$. The conditional moments of $\tilde{x}$, given $\theta$, are:

(1.1)  $m_i(\theta) = E\{\tilde{x}^i \mid \theta\}$ ,  $(i = 1, 2, \ldots)$


If we were to attempt to predict $\tilde{x}$ prior to observing any data, and without knowing $\theta$, we would have to use the marginal density of $\tilde{x}$, $p(x) = E\{p(x \mid \tilde{\theta})\} = \int p(x \mid \theta)\,p(\theta)\,d\theta$, which has prior-to-data (marginal) moments:

(1.2)  $m_i = E\{m_i(\tilde{\theta})\} = E\{\tilde{x}^i\}$ .


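For convenience in the sequel, we also define higher order cross-moments about the origin, such as:

(1.3)  $m_{ij} = E\{m_i(\tilde{\theta})\,m_j(\tilde{\theta})\}$ ;  $m_{ijk} = E\{m_i(\tilde{\theta})\,m_j(\tilde{\theta})\,m_k(\tilde{\theta})\}$ ;  etc.,

explicitly permitting the indices to be repeated, e.g., $m_{11} = E\{(m_1(\tilde{\theta}))^2\}$. Thus, from the four conditional moments $\{m_i(\theta);\ i = 1,2,3,4\}$, we can form eleven marginal moments of order four or less:

(1.4)  $\{m_1;\ m_2, m_{11};\ m_3, m_{21}, m_{111};\ m_4, m_{31}, m_{22}, m_{211}, m_{1111}\}$ .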


Three central moments of order two deserve special symbols:

(1.5)  $e = EV\{\tilde{x} \mid \tilde{\theta}\} = m_2 - m_{11}$ ;  $d = VE\{\tilde{x} \mid \tilde{\theta}\} = m_{11} - m_1^2$ ;
       $c = V\{\tilde{x}\} = e + d = m_2 - m_1^2$ ,

where double operators and their corresponding operands are to be interpreted "inside-out." Central moments of higher order can also be defined.
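As a small numerical illustration of (1.5), the following sketch computes e, d, and c from the three marginal moments $m_1$, $m_2$, $m_{11}$. The function name and the Poisson likelihood with a Gamma(shape 3, rate 2) prior are assumptions chosen only for illustration; they do not come from the paper.

```python
def variance_components(m1, m2, m11):
    """Return (e, d, c) of (1.5) from the marginal moments m1, m2, m11."""
    e = m2 - m11        # E V{x | theta}: expected within-risk variance
    d = m11 - m1 ** 2   # V E{x | theta}: variance of the conditional mean
    c = e + d           # V{x}: total marginal variance, equal to m2 - m1^2
    return e, d, c


# Assumed example: Poisson likelihood, Gamma(shape 3, rate 2) prior on the rate,
# for which m1 = 1.5, m11 = E{theta^2} = 3.0 and m2 = m1 + m11 = 4.5.
e, d, c = variance_components(1.5, 4.5, 3.0)
```

In this assumed case the ratio e/d equals 2, the rate parameter of the Gamma prior.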

Now suppose that n independent observations, $\mathcal{D} = \{x_1, x_2, \ldots, x_n\}$, are drawn from the same likelihood density, $p(x \mid \theta)$, with $\theta$ fixed, but unknown. From Bayes' law, the posterior-to-data parameter density is:

(1.6)  $p(\theta \mid \mathcal{D}) \propto \prod_{u=1}^{n} p(x_u \mid \theta)\,p(\theta)$ ,

and knowing this enables us to calculate the posterior-to-data predictive density for the next observation, $\tilde{x}_{n+1}$, as:

(1.7)  $p(x_{n+1} \mid \mathcal{D}) = \int p(x_{n+1} \mid \theta)\,p(\theta \mid \mathcal{D})\,d\theta$ .

This is, in fact, the predictive density for any future observation, assuming that $\theta$ does not change, and that no more information is available. From our viewpoint, given $\mathcal{D}$, the $\{\tilde{x}_{n+1}, \tilde{x}_{n+2}, \tilde{x}_{n+3}, \ldots\}$ are exchangeable random variables; for example, the joint predictive density of $(\tilde{x}_{n+1}, \tilde{x}_{n+2})$ is:

(1.8)  $p(x_{n+1}, x_{n+2} \mid \mathcal{D}) = \int p(x_{n+1} \mid \theta)\,p(x_{n+2} \mid \theta)\,p(\theta \mid \mathcal{D})\,d\theta$ .


(1.7) and (1.8) also have predictive moments analogous to (1.2), (1.3):

(1.9)  $m_1(\mathcal{D}) = E\{\tilde{x}_{n+1} \mid \mathcal{D}\}$ ;  $m_2(\mathcal{D}) = E\{\tilde{x}_{n+1}^2 \mid \mathcal{D}\}$ ;  $m_{11}(\mathcal{D}) = E\{\tilde{x}_{n+1}\tilde{x}_{n+2} \mid \mathcal{D}\}$ ;  etc.,

that can, in principle, be calculated exactly; however, analytic solutions almost always require that $p(x \mid \theta)$ and $p(\theta)$ be chosen from among natural conjugate families. We now consider how approximate results can be obtained for the predictive moments in (1.9).
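When the prior is concentrated on finitely many parameter values, (1.6) through (1.9) can be evaluated directly. The sketch below does this for a Poisson likelihood; the two-point prior, the data, and all function names are assumptions made for illustration only.

```python
import math


def poisson_pmf(x, rate):
    """Poisson probability of the count x for the given rate."""
    return math.exp(-rate) * rate ** x / math.factorial(x)


def posterior(prior, data):
    """Posterior weights (1.6) for a finite prior given as {theta: probability}."""
    weights = {th: p * math.prod(poisson_pmf(x, th) for x in data)
               for th, p in prior.items()}
    total = sum(weights.values())
    return {th: w / total for th, w in weights.items()}


def predictive_moments(prior, data):
    """Predictive moments m1(D), m2(D), m11(D) of (1.9) for Poisson data."""
    post = posterior(prior, data)
    m1 = sum(p * th for th, p in post.items())               # E{x_{n+1} | D}
    m2 = sum(p * (th + th ** 2) for th, p in post.items())   # E{x_{n+1}^2 | D}
    m11 = sum(p * th ** 2 for th, p in post.items())         # E{x_{n+1} x_{n+2} | D}
    return m1, m2, m11


prior = {0.5: 0.5, 2.0: 0.5}   # assumed two-point prior on the Poisson rate
data = [3, 2, 4]               # assumed observations
m1, m2, m11 = predictive_moments(prior, data)
```

Because the Poisson conditional variance equals the conditional mean, the difference m2 - m11 reproduces m1 here, which gives a quick consistency check on the computation.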
2. CREDIBLE MEAN FORMULAE

Consider first the problem of calculating or approximating $m_1(\mathcal{D})$. For many years, actuaries (in a different terminology) have been assuming that this "experience-rated premium" was linear in the data, as summarized in the sample mean, $\bar{x} = \sum x_u / n$ (it is clear from exchangeability arguments that each of the samples, $x_u$, should be weighted the same). Using heuristic reasoning, they argued for the approximation:

(2.1)  $m_1(\mathcal{D}) = E\{\tilde{x}_{n+1} \mid \mathcal{D}\} \approx f_1^*(\mathcal{D}) = (1 - z_1)\,m_1 + z_1\bar{x}$ ,

i.e., the forecast, $f_1^*(\mathcal{D})$, should be a convex combination of the "manual" (prior) mean, $m_1$, and the "experienced" mean, $\bar{x}$. The "credibility factor," $z_1$, that weights these two means is, they argued:


(2.2)  $z_1 = \dfrac{n}{n_{01} + n}$ ,

where the "credibility time constant," $n_{01}$, was to be chosen empirically. This heuristic formula, used for many years, was considerably strengthened by Bühlmann (1967), who showed that the best linear formula (in the least-squares sense) to approximate the predictive mean $m_1(\mathcal{D})$ was precisely the credibility formula, $f_1^*(\mathcal{D})$, but with the time constant computed explicitly from the prior second moments:


(2.3)  $n_{01} = \dfrac{e}{d} = \dfrac{m_2 - m_{11}}{m_{11} - m_1^2}$ .


Thus, a credibility predictor to approximate $E\{\tilde{x}_{n+1} \mid \mathcal{D}\}$ needs only the first three components of (1.4), $\{m_1;\ m_2, m_{11}\}$, instead of the complete shape of the prior and likelihood densities.
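The credibility forecast of the mean can be sketched in a few lines, taking the time constant as the ratio e/d of the variance components in (1.5), which is Bühlmann's least-squares result. The function name and the numerical values (a Poisson likelihood with a Gamma(shape 3, rate 2) prior) are assumptions for illustration.

```python
def credibility_mean(data, m1, m2, m11):
    """Credibility forecast (2.1) of the mean of the next observation."""
    n = len(data)
    xbar = sum(data) / n
    n01 = (m2 - m11) / (m11 - m1 ** 2)   # time constant e / d
    z1 = n / (n01 + n)                   # credibility factor (2.2)
    return (1 - z1) * m1 + z1 * xbar


# Assumed prior moments of a Poisson likelihood with Gamma(shape 3, rate 2) prior.
forecast = credibility_mean([2, 4, 3, 5], m1=1.5, m2=4.5, m11=3.0)
```

For this conjugate pair the exact Bayesian posterior mean is (3 + 14)/(2 + 4) = 17/6, and the credibility forecast reproduces it, in line with the exactness results discussed in the text.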
In fact, Bailey, Mayerson, and others had already shown in the '50s that (2.1), (2.2), (2.3) was exactly $m_1(\mathcal{D})$ for many "natural" $p(x \mid \theta)$ and $p(\theta)$ used in Bayesian modelling. Jewell (1974a) then showed that, if the likelihood were a member of the simple exponential family (for which $\bar{x}$ is the sufficient statistic) over some space $\mathcal{X}$:

(2.4)  $p(x \mid \theta) = \dfrac{a(x)\,e^{-\theta x}}{c(\theta)}$ ,  $(x \in \mathcal{X})$

and $p(\theta)$ were the natural conjugate prior to (2.4):

(2.5)  $p(\theta) = \dfrac{[c(\theta)]^{-n_{01}}\,e^{-\theta x_{01}}}{g(n_{01}, x_{01})}$ ,  $(\theta \in \Theta)$

over the maximal range $\Theta$ for which the normalization $g(n_{01}, x_{01})$ is finite, then, under a certain regularity condition (Jewell (1975)), (2.1) is exact, with the hyperparameters $n_{01}$ in (2.3) and (2.5) identical, and with $x_{01} = m_1 n_{01}$.

A simple argument also shows that, if the exponent $\theta x$ in (2.4) is replaced by, say, $\theta t(x)$, then the credibility form (2.1) again provides an exact prediction for $E\{t(\tilde{x}_{n+1}) \mid \mathcal{D}\}$ as a linear combination of the prior mean of the statistic, $E\{t(\tilde{x})\}$, and the sample mean of the statistic, $\sum t(x_u)/n$, with appropriate redefinition of (2.3). For this, and other reasons that will become clearer below, we feel that (2.1) is a robust formula in most cases.


3.  EXACT  RESULTS  FOR  SECOND  MOMENTS 


We now consider exact results that are known for the predictive moments, $m_1(\mathcal{D})$, $m_2(\mathcal{D})$, and $m_{11}(\mathcal{D})$, concentrating on the most-studied case, the simple exponential family.

It is well known that the combination (2.4), (2.5) is closed under sampling, so that, posterior-to-data $\mathcal{D}$, the hyperparameters in (2.5) are replaced by:

(3.1)  $n_{01} \leftarrow n_{01} + n$ ;  $x_{01} \leftarrow x_{01} + n\bar{x}$ .


Since $m_1 = x_{01}/n_{01}$, it follows that the updated first moment is:

(3.2)  $E\{\tilde{x}_{n+1} \mid \mathcal{D}\} = m_1(\mathcal{D}) = \dfrac{x_{01} + n\bar{x}}{n_{01} + n} = (1 - z_1)\,m_1 + z_1\bar{x}$ ,


which is simply (2.1), (2.2). It is also clear that the marginal second moments must also involve only $n_{01}$ and $x_{01}$, and that the predictive second moments must be a function of only the sufficient statistic, $\bar{x}$, but no further statement can be made about dependencies in general.

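Jewell (1974a) tabulates $d = d(n_{01}, x_{01})$ for six of the examples given below, whence one can easily get $e = n_{01}\,d(n_{01}, x_{01})$, $c = (n_{01} + 1)\,d(n_{01}, x_{01})$, and hence:

(3.3)  $m_2(\mathcal{D}) = (n_{01} + 1 + n)\,d(n_{01} + n,\ x_{01} + n\bar{x}) + m_1^2(\mathcal{D})$ ;
       $m_{11}(\mathcal{D}) = d(n_{01} + n,\ x_{01} + n\bar{x}) + m_1^2(\mathcal{D})$ ;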


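and, from these, the updated versions of the central moments c and d:

(3.4)  $V\{\tilde{x}_{n+1} \mid \mathcal{D}\} = m_2(\mathcal{D}) - m_1^2(\mathcal{D})$ ;
       $C\{\tilde{x}_{n+1}; \tilde{x}_{n+2} \mid \mathcal{D}\} = m_{11}(\mathcal{D}) - m_1^2(\mathcal{D})$ .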
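The updating scheme (3.1) through (3.4) is easy to trace numerically. The sketch below uses the Poisson likelihood with Gamma prior (Example 3 below), for which $d(n_{01}, x_{01}) = x_{01}/n_{01}^2$; the function names, hyperparameter values, and data are assumptions for illustration, and exact rational arithmetic is used so the results can be compared with hand calculation.

```python
from fractions import Fraction


def d_poisson_gamma(n01, x01):
    """d(n01, x01) = x01 / n01^2 for a Poisson likelihood with Gamma prior."""
    return Fraction(x01) / Fraction(n01) ** 2


def exact_predictive_moments(n01, x01, data, d=d_poisson_gamma):
    """Exact m1(D), m2(D), m11(D) from (3.1)-(3.3); integer inputs assumed."""
    n_post = n01 + len(data)                 # updated n01, from (3.1)
    x_post = x01 + sum(data)                 # updated x01, from (3.1)
    m1 = Fraction(x_post, n_post)            # (3.2)
    d_post = d(n_post, x_post)
    m2 = (n_post + 1) * d_post + m1 ** 2     # first line of (3.3)
    m11 = d_post + m1 ** 2                   # second line of (3.3)
    return m1, m2, m11


m1, m2, m11 = exact_predictive_moments(2, 3, [2, 4, 3, 5])
variance = m2 - m1 ** 2       # V{x_{n+1} | D}, first line of (3.4)
covariance = m11 - m1 ** 2    # C{x_{n+1}; x_{n+2} | D}, second line of (3.4)
```

The resulting variance, 119/36, agrees with the variance of the negative binomial predictive distribution implied by a Gamma(17, 6) posterior.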


Example 1:

Let $p(x \mid \theta)$ be Bernoulli$(\pi)$ and $p(\pi)$ be Beta$(x_{01},\ n_{01} - x_{01})$, $(\theta = \ln(\pi^{-1} - 1))$, then:

(3.5)  $d(n_{01}, x_{01}) = \dfrac{x_{01}(n_{01} - x_{01})}{n_{01}^2(n_{01} + 1)}$ .


Example 2:

Let $p(x \mid \theta)$ be Geometric$(\pi)$, and $p(\pi)$ be Beta$(x_{01},\ n_{01} + 1)$, $(\theta = \ln \pi^{-1})$, then:

(3.6)  $d(n_{01}, x_{01}) = \dfrac{x_{01}(x_{01} + n_{01})}{n_{01}^2(n_{01} - 1)}$ .


Example 3:

Let $p(x \mid \theta)$ be Poisson$(\pi)$ and $p(\pi)$ be Gamma$(x_{01}, n_{01})$, $(\theta = \ln \pi^{-1})$, then:

(3.7)  $d(n_{01}, x_{01}) = \dfrac{x_{01}}{n_{01}^2}$ .


Example 4:

Let $p(x \mid \theta)$ be Exponential$(\theta)$, and $p(\theta)$ be Gamma$(n_{01} + 1,\ x_{01})$, then:

(3.8)  $d(n_{01}, x_{01}) = \dfrac{x_{01}^2}{n_{01}^2(n_{01} - 1)}$ .


Example 5:

Let $p(x \mid \theta)$ be Normal$(\pi, s_0^2)$, $s_0^2$ known, and $p(\pi)$ be Normal$(x_{01}/n_{01},\ s_0^2/n_{01})$, $(\theta = -\pi/s_0^2)$, then:

(3.9)  $d(n_{01}, x_{01}) = \dfrac{s_0^2}{n_{01}}$  (independent of $x_{01}$).


Thus, in these examples from Jewell (1974a), $d(n_{01}, x_{01})$, $m_1(\mathcal{D})$, $m_2(\mathcal{D})$, and $m_{11}(\mathcal{D})$ are all linear, quadratic, or constant in $x_{01}$ and hence in $\bar{x}$ as well.

Morris (1982) refers to simple exponential likelihoods (2.4) in which $m_2(\theta)$ is at most a quadratic polynomial in $m_1(\theta)$ as QVF-NEF; he shows that the only members of this family are the five examples above, plus Example 6, below, plus all of the related members found through linear translation and convolution (Binomial, Pascal, Gamma, etc.).

Example 6:

The last member of this group is the Hyperbolic Secant density:

$p(x \mid \theta) = \dfrac{(\cos\theta)\,e^{-\theta x}}{2\cosh(\pi x/2)}$ ,  $x \in (-\infty, +\infty)$ ,  $\theta \in \left(-\tfrac{\pi}{2}, +\tfrac{\pi}{2}\right)$ ,

for which

(3.10)  $d(n_{01}, x_{01}) = \dfrac{x_{01}^2 + n_{01}^2}{n_{01}^2(n_{01} - 1)}$ .

This likelihood seems to be useful only in certain random-walk problems.

We  should  mention  also  that  it  is  easy  to  construct  members  of  the 
simple  exponential  family  in  which  the  mean  is  a  complicated  function  of 
x  ,  for  example,  by  truncating  the  range  of  any  of  the  above  distributions. 

To obtain dependency on $\bar{x}$ and other statistics, we must turn to two-parameter families, of which the most popular is the normal density with both the mean, $\mu$, and the precision, $\omega$, as random quantities.


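Example 7:

Let

$p(x \mid \theta) = p(x \mid \mu, \omega) = \text{Normal}(\mu, \omega^{-1})$ ,

$p(\omega) = \text{Gamma}\left(\alpha,\ \tfrac{1}{2}\left[x_{02} - \dfrac{x_{01}^2}{n_{01}}\right]\right)$ ,

and

$p(\mu \mid \omega) = \text{Normal}\left(\dfrac{x_{01}}{n_{01}},\ (n_{01}\omega)^{-1}\right)$ ,

with $\alpha$, $x_{01}$, $x_{02}$, and $n_{01}$ given hyperparameters. This family is closed under sampling, with updating:

(3.11)  $\alpha \leftarrow \alpha + \dfrac{n}{2}$ ;  $n_{01} \leftarrow n_{01} + n$ ;
        $x_{01} \leftarrow x_{01} + \sum x_u$ ;  $x_{02} \leftarrow x_{02} + \sum x_u^2$ ;

from which we find that (3.2) again holds, and that:

(3.12)  $d = d(n_{01}, \alpha, x_{01}, x_{02}) = \dfrac{c}{n_{01} + 1} = \dfrac{\left(\dfrac{x_{02}}{n_{01}}\right) - \left(\dfrac{x_{01}}{n_{01}}\right)^2}{2\alpha - 2}$ ,

where c = e + d is the prior variance.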


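For this example, we see that the updating will give exact second-moment predictors that are quadratic in $\bar{x}$ and linear in $\overline{x^2} = \sum x_u^2/n$. Because the normal case is so important to least-squares approximations, we also give the exact results corresponding to (3.4) in terms of the sample variance, $s^2 = n^{-1}\sum (x_u - \bar{x})^2$, the sample mean, $\bar{x}$, the prior marginal variance, c, and the credibility factor, $z_1$:

(3.13)  $V\{\tilde{x}_{n+1} \mid \mathcal{D}\} = (n_{01} + 1 + n)\,C\{\tilde{x}_{n+1}; \tilde{x}_{n+2} \mid \mathcal{D}\}$
        $= \dfrac{n_{01} + 1 + n}{2\alpha - 2 + n}\left[(1 - z_1)\dfrac{(2\alpha - 2)\,c}{n_{01} + 1} + z_1 s^2 + z_1(1 - z_1)(m_1 - \bar{x})^2\right]$ .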

An important simplification occurs if the "natural" choice $2\alpha = n_{01} + 3$ is made; note that this does not significantly restrict the choice of the 2-parameter Gamma, but does mean that there are only three distinct hyperparameters in all. (3.13) then simplifies to a generalized credibility formula:

(3.14)  $V\{\tilde{x}_{n+1} \mid \mathcal{D}\} = (n_{01} + 1 + n)\,C\{\tilde{x}_{n+1}; \tilde{x}_{n+2} \mid \mathcal{D}\}$
        $= (1 - z_1)\,c + z_1 s^2 + z_1(1 - z_1)(m_1 - \bar{x})^2$ .

This result is not new, but rearrangement into credibility form first appeared in Jewell (1974a). The equivalent multidimensional formula appears in Jewell (1974b) and Jewell (1983).

(3.14) is, in fact, equivalent to:

(3.15)  $E\{\tilde{x}_{n+1}^2 \mid \mathcal{D}\} = m_2(\mathcal{D}) = (1 - z_1)\,m_2 + z_1\overline{x^2}$ ,

that is, the predictive second moment is exactly in credibility form with the same credibility factor as in (2.2), with obvious adjustments to the prior mean and sample mean.
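The stated equivalence of (3.14) and (3.15) is an algebraic identity, which can be checked numerically: the second moment from (3.15), less the square of the mean from (3.2), should equal the variance from (3.14). The function names and the numerical inputs below are arbitrary assumptions for illustration.

```python
def credibility_factor(n, n01):
    """Credibility factor z1 of (2.2)."""
    return n / (n01 + n)


def variance_3_14(data, m1, c, n01):
    """Predictive variance in the credibility form (3.14)."""
    n = len(data)
    z1 = credibility_factor(n, n01)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / n   # sample variance, divisor n
    return (1 - z1) * c + z1 * s2 + z1 * (1 - z1) * (m1 - xbar) ** 2


def second_moment_3_15(data, m1, c, n01):
    """Predictive second moment (3.15), using the prior value m2 = c + m1^2."""
    n = len(data)
    z1 = credibility_factor(n, n01)
    mean_of_squares = sum(x * x for x in data) / n
    return (1 - z1) * (c + m1 ** 2) + z1 * mean_of_squares


def mean_3_2(data, m1, n01):
    """Predictive mean (3.2)."""
    n = len(data)
    z1 = credibility_factor(n, n01)
    return (1 - z1) * m1 + z1 * sum(data) / n


data = [1.2, 0.4, 2.9, 1.7]      # assumed observations
m1, c, n01 = 1.0, 2.0, 3.0       # assumed prior mean, prior variance, time constant
gap = (second_moment_3_15(data, m1, c, n01)
       - mean_3_2(data, m1, n01) ** 2
       - variance_3_14(data, m1, c, n01))
```

The quantity `gap` vanishes up to floating-point rounding for any choice of inputs.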
4. LEAST-SQUARES THEORY AND MULTIDIMENSIONAL CREDIBILITY

We now take a temporary detour to display some general results from multidimensional credibility theory that will be used in the next section. Suppose we have a vector-valued version of the Bayesian model of Section 1, in which samples $\mathcal{D} = \{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n\}$ of a vector-valued random variable, $\mathbf{y}$, are to be used to predict a random vector, $\mathbf{w}$. If we approximate $E\{\mathbf{w} \mid \mathcal{D}\}$ by a linear function of the vector-valued sample mean, $\bar{\mathbf{y}} = \sum \mathbf{y}_u / n$, least-squares theory then shows that the best (vector-valued) predictor is:

(4.1)  $\mathbf{f}(\mathcal{D}) = (E\{\mathbf{w}\} - Z\,E\{\bar{\mathbf{y}}\}) + Z\bar{\mathbf{y}} \approx E\{\mathbf{w} \mid \mathcal{D}\}$ ,


where Z is a matrix of appropriate dimensions given by the solution of the normal system of equations:

(4.2)  $Z\,C\{\bar{\mathbf{y}}; \bar{\mathbf{y}}\} = C\{\mathbf{w}; \bar{\mathbf{y}}\}$

(C is the matrix covariance operator).

Now suppose $\mathbf{w}$ is actually a future observation of the same random vector, say $\mathbf{w} = \mathbf{y}_{n+1}$; then:

(4.1')  $\mathbf{f}(\mathcal{D}) = (I - Z)\,\mathbf{m} + Z\bar{\mathbf{y}} \approx E\{\mathbf{y}_{n+1} \mid \mathcal{D}\}$ ,

where I is the unit matrix, Z is the square solution of:

(4.2')  $Z\left(D + \dfrac{1}{n}E\right) = D$ ,

$\mathbf{m}$ is the prior mean vector, obtained from the conditional mean vector:

(4.3)  $\mathbf{m}(\theta) = E\{\mathbf{y} \mid \theta\}$ ;  $\mathbf{m} = E\{\mathbf{y}\} = E\{\mathbf{m}(\tilde{\theta})\}$ ;


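and E and D are the two components of within-risk and between-risk covariance, respectively:

(4.4)  $E = E\,C\{\mathbf{y}; \mathbf{y} \mid \tilde{\theta}\}$ ;  $D = C\{\mathbf{m}(\tilde{\theta}); \mathbf{m}(\tilde{\theta})\}$ ;
       $C = C\{\mathbf{y}; \mathbf{y}\} = E + D$ .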


Thus, the credibility formula of Section 2 extends directly to the multidimensional case, with a credibility matrix, Z, mixing the prior mean, $\mathbf{m}$, and the experience mean, $\bar{\mathbf{y}}$. The analogy is complete if we assume D has an inverse and rearrange (4.2'):

(4.5)  $Z = nD(E + nD)^{-1} = n(nI + N)^{-1}$ ;  $N = ED^{-1}$ ,

where N is now a matrix of time constants. Further details on this extension may be found in Jewell (1974b).
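The rearrangement leading to (4.5) takes only a few steps, written out here for a nonsingular D:

```latex
\begin{align*}
Z\left(D + \tfrac{1}{n}E\right) = D
  \;&\Longrightarrow\;
  Z = D\left(D + \tfrac{1}{n}E\right)^{-1} = nD\,(E + nD)^{-1}, \\
nD\,(E + nD)^{-1}
  &= n\left[(E + nD)\,D^{-1}\right]^{-1}
   = n\left(ED^{-1} + nI\right)^{-1}
   = n\,(nI + N)^{-1}, \qquad N = ED^{-1}.
\end{align*}
```

In one dimension, N reduces to the scalar time constant e/d and Z to the credibility factor n/(n_{01} + n) of (2.2).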

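The accuracy of any forecast $\mathbf{f}(\mathcal{D})$ for $\mathbf{y}_{n+1}$ is measured by the diagonal terms of the expected squared-error matrix:

(4.6)  $\Phi = E\{[\mathbf{y}_{n+1} - \mathbf{f}(\mathcal{D})][\mathbf{y}_{n+1} - \mathbf{f}(\mathcal{D})]'\}$ ;

note that the expectation is over all possible joint values of $(\mathbf{y}_{n+1}; \mathcal{D})$. However, since the latter are independent, given $\theta$, $\Phi$ can be decomposed into:

(4.6')  $\Phi = E\{[\mathbf{y}_{n+1} - \mathbf{m}(\tilde{\theta})][\mathbf{y}_{n+1} - \mathbf{m}(\tilde{\theta})]'\} + E\{[\mathbf{f}(\mathcal{D}) - \mathbf{m}(\tilde{\theta})][\mathbf{f}(\mathcal{D}) - \mathbf{m}(\tilde{\theta})]'\}$
        $= E + \Psi$ ,  say,

where we see the portion of the mse due to the inherent fluctuation of the observable, and the mse due to the approximation of the true mean, $\mathbf{m}(\theta)$, by the forecast, $\mathbf{f}(\mathcal{D})$.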

We know that the minimum values of the diagonal terms for $\Phi$ and $\Psi$ are attained by picking the Bayesian predictive mean, $\mathbf{m}(\mathcal{D}) = E\{\mathbf{y}_{n+1} \mid \mathcal{D}\}$, which, in general, leads to a nonlinear regression on the data. With a linear forecast (4.1'), (4.2'), it is easy to show that, for any n,

(4.7)  $\Psi = E[E + nD]^{-1}D = [I - Z]D = D[I - Z']$ .

In most cases of interest, all terms of $\Psi$ will approach zero as n approaches infinity for any forecast, so that all forecasts are asymptotically equivalent; in the linear case, it usually happens because Z approaches I (see also (8.6)). Fortunately, a linear predictor usually has small mean-squared-error also for moderate n, even though $\mathbf{f}(\mathcal{D})$ is not exactly the Bayesian predictive mean.

We  now  examine  the  use  of  (4.1'),  (4.2')  as  an  approximation  for  our 
original  one-dimensional  problem  of  estimating  second  moments. 
5. ORGANIZING THE LEAST-SQUARES COMPUTATIONS

We return to the main problem of organizing credibility approximations $f_2(\mathcal{D})$ and $f_{11}(\mathcal{D})$ for $m_2(\mathcal{D})$ and $m_{11}(\mathcal{D})$ with arbitrary distributions. In view of the exact results in Section 3, it seems reasonable to restrict the statistics to be used to linear and quadratic functions of the data; however, there are several different ways to select statistics of this type. After a great deal of experimentation, the authors have found that the choices that give the simplest and clearest results are the "natural" first and second moments about the origin:


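(5.1)  $t_1(\mathcal{D}) = \dfrac{1}{n}\sum_{u} x_u = \bar{x}$ ;  $t_2(\mathcal{D}) = \dfrac{1}{n}\sum_{u} x_u^2$ ;  $t_{11}(\mathcal{D}) = \dfrac{1}{n(n-1)}\sum\sum_{u \neq v} x_u x_v$ ,

for $n \geq 2$. In other words, we set $\bar{\mathbf{y}} = [t_1(\mathcal{D}); t_2(\mathcal{D}); t_{11}(\mathcal{D})]' = \mathbf{t}(\mathcal{D})$ in (4.1). Note that this choice implicitly includes $(\bar{x})^2 = [t_2(\mathcal{D}) + (n - 1)\,t_{11}(\mathcal{D})]/n$, as well as the sample variance $s^2 = [(n - 1)/n]\,[t_2(\mathcal{D}) - t_{11}(\mathcal{D})]$.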
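The three natural statistics and the two identities just noted can be computed as follows; this is a minimal sketch, and the function name and sample data are assumptions for illustration.

```python
def natural_statistics(data):
    """Return (t1, t2, t11) of (5.1); needs at least two observations."""
    n = len(data)
    if n < 2:
        raise ValueError("t11 is defined only for n >= 2")
    total = sum(data)
    total_sq = sum(x * x for x in data)
    t1 = total / n
    t2 = total_sq / n
    # The sum over u != v of x_u * x_v equals (sum x)^2 - sum x^2.
    t11 = (total ** 2 - total_sq) / (n * (n - 1))
    return t1, t2, t11


data = [2, 4, 3, 5]                           # assumed observations
n = len(data)
t1, t2, t11 = natural_statistics(data)
xbar_squared = (t2 + (n - 1) * t11) / n       # recovers (x-bar)^2
s2 = (n - 1) / n * (t2 - t11)                 # sample variance with divisor n
```

Computing $t_{11}$ from the two running sums avoids the double loop over pairs, so all three statistics can be updated in constant time as new observations arrive.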

As predictands, we can get all the forecasts of interest simultaneously by setting $\mathbf{w} = [\tilde{x}_{n+1}; \tilde{x}_{n+1}^2; \tilde{x}_{n+1}\tilde{x}_{n+2}]'$. Then, to get Z in (4.1), we need only to compute the means in (4.1) and the two covariance matrices in (4.2). This approach is thus similar to credibility regression modelling (Hachemeister (1974)).

For the means, we find easily:

(5.2)  $E\{\mathbf{t} \mid \theta\} = E\{\mathbf{w} \mid \theta\} = \mathbf{m}(\theta) = [m_1(\theta); m_2(\theta); m_{11}(\theta)]'$ ,
       $E\{\mathbf{t}\} = E\{\mathbf{w}\} = \mathbf{m} = [m_1; m_2; m_{11}]'$ .

(Note that $m_{11}(\theta) = m_1^2(\theta)$.) Computation of the covariance terms is straightforward, but tedious, as they involve all 11 moments of (1.4); we find, for $n \geq 2$:


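(5.3)  $C\{\mathbf{t}; \mathbf{t}\} = D + \dfrac{1}{n}E(n)$ ;  $C\{\mathbf{w}; \mathbf{t}\} = D$ ;

where D and E(n) are new matrices, analogous to the matrices in (4.4), but otherwise unrelated. Explicitly, we find:

(5.4)  $D = \begin{bmatrix} m_{11} - m_1^2 & m_{21} - m_2 m_1 & m_{111} - m_{11} m_1 \\ & m_{22} - m_2^2 & m_{211} - m_2 m_{11} \\ \text{(symmetric)} & & m_{1111} - m_{11}^2 \end{bmatrix}$ ;

(5.5)  $E(n) = E_\infty + \dfrac{1}{n - 1}E_1$ ;

where

(5.6)  $E_\infty = \begin{bmatrix} m_2 - m_{11} & m_3 - m_{21} & 2(m_{21} - m_{111}) \\ & m_4 - m_{22} & 2(m_{31} - m_{211}) \\ \text{(symmetric)} & & 4(m_{211} - m_{1111}) \end{bmatrix}$ ;

(5.7)  $E_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2(m_{22} - 2m_{211} + m_{1111}) \end{bmatrix}$ .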


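Once these have been computed, the credibility matrix Z is the solution of:

(5.8)  $Z\left[D + \dfrac{1}{n}E(n)\right] = D$ ;

and the vector forecast $\mathbf{f}(\mathcal{D}) = [f_1(\mathcal{D}); f_2(\mathcal{D}); f_{11}(\mathcal{D})]'$ is given by:

(5.9)  $\mathbf{f}(\mathcal{D}) = (I - Z)\,\mathbf{m} + Z\,\mathbf{t}(\mathcal{D})$ ,

which should be compared with (4.1'), (4.2').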
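The whole computation of this section can be sketched compactly. The example below assumes a Poisson likelihood with a Gamma(shape 3, rate 2) prior, chosen because the exact predictive moments are then known in closed form; the function names, the dictionary keys for the eleven moments, and the small exact linear solver are all illustrative assumptions rather than the authors' implementation.

```python
from fractions import Fraction as F


def gamma_raw_moment(shape, rate, k):
    """E{theta^k} for theta ~ Gamma(shape, rate), integer arguments."""
    out = F(1)
    for j in range(k):
        out *= F(shape + j, rate)
    return out


def poisson_gamma_moments(shape, rate):
    """The eleven marginal moments (1.4) for Poisson data with a Gamma prior."""
    g = [gamma_raw_moment(shape, rate, k) for k in range(5)]
    return {"1": g[1], "2": g[1] + g[2], "11": g[2],
            "3": g[1] + 3 * g[2] + g[3], "21": g[2] + g[3], "111": g[3],
            "4": g[1] + 7 * g[2] + 6 * g[3] + g[4],
            "31": g[2] + 3 * g[3] + g[4], "22": g[2] + 2 * g[3] + g[4],
            "211": g[3] + g[4], "1111": g[4]}


def d_matrix(m):
    """Between-risk covariance matrix D of (5.4)."""
    d12 = m["21"] - m["2"] * m["1"]
    d13 = m["111"] - m["11"] * m["1"]
    d23 = m["211"] - m["2"] * m["11"]
    return [[m["11"] - m["1"] ** 2, d12, d13],
            [d12, m["22"] - m["2"] ** 2, d23],
            [d13, d23, m["1111"] - m["11"] ** 2]]


def e_matrix(m, n):
    """E(n) = E_inf + E_1 / (n - 1), from (5.5)-(5.7)."""
    e12 = m["3"] - m["21"]
    e13 = 2 * (m["21"] - m["111"])
    e23 = 2 * (m["31"] - m["211"])
    e33 = (4 * (m["211"] - m["1111"])
           + F(2, n - 1) * (m["22"] - 2 * m["211"] + m["1111"]))
    return [[m["2"] - m["11"], e12, e13],
            [e12, m["4"] - m["22"], e23],
            [e13, e23, e33]]


def solve(a, b):
    """Solve a X = b by Gauss-Jordan elimination (exact with Fractions)."""
    k = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(k)]
    for col in range(k):
        pivot_row = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def joint_forecast(data, m):
    """Joint forecast (5.9) of [m1(D), m2(D), m11(D)], with Z from (5.8)."""
    n = len(data)
    s1, s2 = sum(data), sum(x * x for x in data)
    t = [F(s1, n), F(s2, n), F(s1 * s1 - s2, n * (n - 1))]
    d, e = d_matrix(m), e_matrix(m, n)
    cov_t = [[d[i][j] + e[i][j] / n for j in range(3)] for i in range(3)]
    z_transposed = solve(cov_t, d)   # cov_t^{-1} D = Z', both being symmetric
    prior = [m["1"], m["2"], m["11"]]
    return [prior[i] + sum(z_transposed[j][i] * (t[j] - prior[j])
                           for j in range(3))
            for i in range(3)]


forecast = joint_forecast([2, 4, 3, 5], poisson_gamma_moments(3, 2))
```

In this assumed case the exact predictive moments are linear in the three natural statistics, so the joint forecast reproduces them exactly: 17/6, 34/3 and 17/2. The system (5.8) remains solvable here even though D itself has rank two for the Poisson case, because only D + E(n)/n needs to be inverted.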


6.  INDEPENDENT  FORECASTS  USING  NATURAL  STATISTICS 


Before examining the various aspects of the three-dimensional forecast (5.9), it is of interest to consider first how the one-dimensional result (2.1) would generalize if second-moment forecasts were made only in terms of their "natural" statistics, i.e., if the solution to Z were forced to be diagonal. We find:


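(6.1)  $m_2(\mathcal{D}) = E\{\tilde{x}_{n+1}^2 \mid \mathcal{D}\} \approx f_2^*(\mathcal{D}) = (1 - z_2)\,m_2 + z_2\,t_2(\mathcal{D})$ ;
       $z_2 = \dfrac{n}{n_{02} + n}$ ;  $n_{02} = \dfrac{m_4 - m_{22}}{m_{22} - m_2^2}$ ;

and, for $n \geq 2$,

(6.2)  $m_{11}(\mathcal{D}) = E\{\tilde{x}_{n+1}\tilde{x}_{n+2} \mid \mathcal{D}\} \approx f_{11}^*(\mathcal{D}) = (1 - z_{11})\,m_{11} + z_{11}\,t_{11}(\mathcal{D})$ ;
       $z_{11} = \dfrac{n}{n + n_{011}(n)}$ ;  $n_{011}(n) = \dfrac{4(m_{211} - m_{1111}) + \dfrac{2}{n - 1}(m_{22} - 2m_{211} + m_{1111})}{m_{1111} - m_{11}^2}$ .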


These are to be compared with (2.1), (2.2), (2.3), which, of course, still hold for the first-moment forecast. (Note that asterisks distinguish the independent forecasts $f_1^*$, $f_2^*$, and $f_{11}^*$ from the corresponding components of the joint forecast $\mathbf{f}$, and that $z_{11}$ in (6.2) is not the (1,1) component of Z in (4.2').) We will return to analysis of independent forecasts in Section 9, after analyzing the asymptotic behaviors of (5.8).
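The two independent second-moment forecasts need only a handful of prior moments. The sketch below evaluates (6.1) and (6.2); the function name, the dictionary layout, and the numerical moments (those of a Poisson likelihood with a Gamma(shape 3, rate 2) prior) are assumptions for illustration.

```python
def independent_second_moment_forecasts(t2, t11, n, m):
    """Diagonal credibility forecasts (6.1) and (6.2); requires n >= 2."""
    # Time constant and forecast for E{x_{n+1}^2 | D}, from (6.1).
    n02 = (m["4"] - m["22"]) / (m["22"] - m["2"] ** 2)
    z2 = n / (n02 + n)
    f2 = (1 - z2) * m["2"] + z2 * t2
    # Time-varying time constant and forecast for E{x_{n+1} x_{n+2} | D}, from (6.2).
    numerator = (4 * (m["211"] - m["1111"])
                 + 2 * (m["22"] - 2 * m["211"] + m["1111"]) / (n - 1))
    n011 = numerator / (m["1111"] - m["11"] ** 2)
    z11 = n / (n + n011)
    f11 = (1 - z11) * m["11"] + z11 * t11
    return (n02, z2, f2), (n011, z11, f11)


# Assumed prior moments: Poisson likelihood, Gamma(shape 3, rate 2) prior.
moments = {"2": 4.5, "11": 3.0, "4": 90.0, "22": 40.5, "211": 30.0, "1111": 22.5}
# Natural statistics of the assumed sample [2, 4, 3, 5].
(n02, z2, f2), (n011, z11, f11) = independent_second_moment_forecasts(
    t2=13.5, t11=142 / 12, n=4, m=moments)
```

Each forecast is a convex combination of a prior moment and its own natural statistic, so it always lies between the two.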

7. LIMITING BEHAVIOR OF THE JOINT FORECAST

The analogy with (4.5) is complete if we can assume that D has an inverse (but see Section 8), for then (5.8) can be rearranged into:

(7.1)  $Z = n(nI + N(n))^{-1}$ ;  $N(n) = E(n)D^{-1}$ ;

so that we now have a time-varying "time constant":

(7.2)  $N(n) = N_\infty + \dfrac{1}{n - 1}N_1$ ;  $N_\infty = E_\infty D^{-1}$ ;  $N_1 = E_1 D^{-1}$ .

Because of the simple form of $E_1$, it follows that $N_1$ induces correction terms only in the third row of Z, that is, in making a prediction of $m_{11}(\mathcal{D})$; furthermore, this correction term vanishes rapidly with increasing n. In fact, one can easily make the asymptotic expansion:

(7.3)  $Z = I - \dfrac{1}{n}N_\infty + \dfrac{1}{n^2}\left(N_\infty^2 - N_1\right) + O\!\left(\dfrac{1}{n^3}\right)$ ,  $(n \to \infty)$

so that the correction term introduces changes only of order $n^{-2}$ or smaller.
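The expansion (7.3) follows from (7.1) and (7.2) by a Neumann series in 1/n, as the following short derivation indicates:

```latex
\begin{align*}
Z &= \left(I + \tfrac{1}{n}N(n)\right)^{-1}
   = I - \tfrac{1}{n}N(n) + \tfrac{1}{n^2}N(n)^2 - \cdots, \\
N(n) &= N_\infty + \tfrac{1}{n-1}N_1
      = N_\infty + \tfrac{1}{n}N_1 + O\!\left(n^{-2}\right), \\
Z &= I - \tfrac{1}{n}N_\infty
     + \tfrac{1}{n^2}\left(N_\infty^2 - N_1\right)
     + O\!\left(n^{-3}\right).
\end{align*}
```

The matrix $N_1$ first enters at order $n^{-2}$ because it is already divided by n - 1 in (7.2).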

More importantly, we see that, if $D^{-1}$ exists, then $Z \to I$ as $n \to \infty$; thus our three-dimensional forecasts become "fully credible," that is, the forecasts $f_i(\mathcal{D})$ are ultimately given essentially by their own natural statistics, $t_i(\mathcal{D})$ $(i = 1, 2, 11)$. Asymptotically, then, the joint predictions of Section 5 will be indistinguishable from the independent forms of the last section.


8. REDUCED-RANK D MATRIX


It would be an unusual model for which $E_\infty$ did not have an inverse; however, it is theoretically possible that $D^{-1}$ does not exist. In several of the special cases examined below, D is of rank two because of the close asymptotic relationship between two of the statistics $t_i(\mathcal{D})$. Thus, to perform the inversion in (5.10), we must use the well-known matrix inversion formula which states that, if a and b are n × k matrices of rank k $(k \leq n)$, then:


(8.1)  $[I_n + ab']^{-1} = I_n - a\,[I_k + b'a]^{-1}\,b'$ .


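If D is of rank two so that, for example, $\mathbf{d}^3 = a_{31}\mathbf{d}^1 + a_{32}\mathbf{d}^2$, where $\mathbf{d}^i$ is the $i$th row of D $(i = 1, 2, 3)$, then D can be written:

(8.2)  $D = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ a_{31} & a_{32} \end{bmatrix} D_{12} = A\,D_{12}$ ,  say,

where $D_{12}$ is the 2 × 3 matrix formed by the first two rows of D.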


We find from (8.1) that:

(8.3)  $Z = A\left[\Lambda(n) + \dfrac{1}{n}I_2\right]^{-1} D_{12}\,E(n)^{-1}$ ,

where $\Lambda(n)$ is the full-rank 2 × 2 matrix:

(8.4)  $\Lambda(n) = D_{12}\,E(n)^{-1}A$ .


The important implication of these results is that, when D is of rank two, the limit of Z(n) as $n \to \infty$ is not $I_3$, but is:

(8.5)  $Z(\infty) = A\,\Lambda(\infty)^{-1} D_{12}\,E_\infty^{-1}$ .

Thus, in this case, the $t_i(\mathcal{D})$ are never "fully credible" for the $f_i(\mathcal{D})$, and dependence upon the prior means, $m_i$ $(i = 1, 2, 11)$, and other moments persists. In fact, $Z(\infty)$ is not even diagonal!

Nevertheless, from (8.3), (8.4), it is easy to show that:

(8.6)  $(I - Z)D = A\,[I_2 + n\Lambda(n)]^{-1} D_{12}$ ,

so that, from (9.1), it follows that the mean-squared-errors of the predictions will vanish even in this case!


9.  MEAN-SQUARED  ERROR.  COMPARISON  OF  THREE-DIMENSIONAL  FORECASTS  WITH 
INDEPENDENT  FORECASTS  USING  NATURAL  STATISTICS 

For completeness, we record that the form (4.7) still gives the mean-squared-error for the joint first- and second-moment forecast, provided that the definitions (5.4) through (5.8) are used:

(9.1)  $\Psi = E\{[\mathbf{f}(\mathcal{D}) - \mathbf{m}(\tilde{\theta})][\mathbf{f}(\mathcal{D}) - \mathbf{m}(\tilde{\theta})]'\} = [I - Z]D$ .

The diagonal terms of this matrix then measure the approximation errors of the various forecasts, call them mse$(f_i(\mathcal{D}))$ $(i = 1, 2, 11)$. Assuming for the moment that D is regular, it follows from (7.3) and (9.1) that, as $n \to \infty$:

(9.2)  $\Psi \approx \dfrac{1}{n}E_\infty - \dfrac{1}{n^2}\left[E_\infty D^{-1}E_\infty - E_1\right] + O\!\left(\dfrac{1}{n^3}\right)$ ,

which shows how quickly the mse's vanish.

There are several arguments in favor of replacing the three-dimensional forecasts (5.9) with their independent counterparts (2.1), (6.1) and (6.2):

(a) Computation of the joint forecast requires the numerical inversion of a 3 × 3 matrix;

(b) Joint forecasts require the estimation of all 11 moments (1.4), whereas independent forecasts use only seven moments $[m_1;\ m_2, m_{11};\ m_4, m_{22}, m_{211}, m_{1111}]$;

(c) The independent credibility forms are intuitively more appealing with their similarity and known dependence upon n, compared with the joint forecast with its complicated dependence upon n.


On the other hand, we expect that changing to independent forecasts will increase the mean-squared-error in each forecast, in general. However, in the numerical examples of Section 11, we find that this numerical difference is usually negligible, at least in absolute value. We now investigate general conditions under which this might be expected to hold true, at least for moderately high values of n.

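Let $d_{ii}$, $e_{\infty ii}$, and $e_{1ii}$ $(i = 1, 2, 3)$ denote the diagonal elements of D, $E_\infty$, and $E_1$, respectively (only $e_{133} \neq 0$). From (9.1), which also holds in the one-dimensional case, and (2.1), we find:

(9.3)  mse$(f_1^*(\mathcal{D})) = E\{[f_1^*(\mathcal{D}) - m_1(\tilde{\theta})]^2\} = (1 - z_1)\,d_{11} \approx \dfrac{1}{n}e_{\infty 11} - \dfrac{1}{n^2}e_{\infty 11}^2 d_{11}^{-1} + O\!\left(\dfrac{1}{n^3}\right)$ .  $(n \to \infty)$

Similarly:

(9.4)  mse$(f_2^*(\mathcal{D})) = (1 - z_2)\,d_{22} \approx \dfrac{1}{n}e_{\infty 22} - \dfrac{1}{n^2}e_{\infty 22}^2 d_{22}^{-1} + O\!\left(\dfrac{1}{n^3}\right)$ ,  $(n \to \infty)$

and

(9.5)  mse$(f_{11}^*(\mathcal{D})) = (1 - z_{11})\,d_{33} \approx \dfrac{1}{n}e_{\infty 33} - \dfrac{1}{n^2}\left[e_{\infty 33}^2 d_{33}^{-1} - e_{133}\right] + O\!\left(\dfrac{1}{n^3}\right)$ .  $(n \to \infty)$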

By comparison with (9.2), we see that the dominant terms and the term due to $E_1$ are identical, so that the positive difference between the mean-squared-errors of the independent and joint forecasts is approximately:

(9.6)  $\dfrac{1}{n^2}\left[\mathbf{e}_{\infty i}'\,D^{-1}\,\mathbf{e}_{\infty i} - e_{\infty ii}^2 d_{ii}^{-1}\right]$ ,  $(i = 1, 2, 3)$

where $\mathbf{e}_{\infty i}$ is the $i$th column-vector of $E_\infty$. This difference is usually negligible, compared to the common dominant term, $n^{-1}e_{\infty ii}$, for moderately large values of n.

A similar analysis can be carried out if D is of rank two.
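The quality of the two-term expansion in (9.3) for the one-dimensional forecast can be checked numerically against the exact value $(1 - z_1)d_{11}$. In the sketch below, e and d stand for $e_{\infty 11}$ and $d_{11}$; the function names and the numerical values are assumptions for illustration.

```python
def mse_independent_mean(n, e, d):
    """Exact mse (1 - z1) * d of the credibility mean, with n01 = e / d."""
    z1 = n / (e / d + n)
    return (1 - z1) * d


def mse_two_term(n, e, d):
    """Two-term expansion e/n - e^2 / (n^2 d) of (9.3)."""
    return e / n - e ** 2 / (n ** 2 * d)


e, d = 1.5, 0.75     # assumed values of e_inf,11 and d_11
errors = {n: abs(mse_independent_mean(n, e, d) - mse_two_term(n, e, d))
          for n in (10, 100, 1000)}
```

The discrepancy shrinks by roughly a factor of 1000 for each tenfold increase in n, consistent with a remainder of order $n^{-3}$.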

10. PREDICTIVE VARIANCE AND FORECAST ERROR

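There are two second-order central moments of special interest: the predictive variance,

(10.1)  $v(\mathcal{D}) = V\{\tilde{x}_{n+1} \mid \mathcal{D}\} = E\{[\tilde{x}_{n+1} - m_1(\mathcal{D})]^2 \mid \mathcal{D}\} = m_2(\mathcal{D}) - m_1^2(\mathcal{D})$ ,

and the posterior-to-data mean-squared-forecast-error:

(10.2)  $\phi(\mathcal{D}) = E\{[\tilde{x}_{n+1} - f_1(\mathcal{D})]^2 \mid \mathcal{D}\} = v(\mathcal{D}) + [f_1(\mathcal{D}) - m_1(\mathcal{D})]^2$ .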

If the $f_1(\mathcal{D})$ and $f_2(\mathcal{D})$ obtained previously are exact, then both of the expressions are identical and equal to $f_2(\mathcal{D}) - f_1^2(\mathcal{D})$. If credibility is only an approximation, then this latter expression may still be a good approximation to $v(\mathcal{D})$ (note that we now may be using quadratic functions of the data in $f_1(\mathcal{D})$). Comparing $\phi(\mathcal{D})$ and $v(\mathcal{D})$ requires knowing how closely the credibility for the mean approximates the Bayesian predictive mean.

We can proceed a bit further if we rewrite the mean-squared-forecast-error as:

(10.3)  $\phi(\mathcal{D}) = m_2(\mathcal{D}) - m_{11}(\mathcal{D}) + E\{[f_1(\mathcal{D}) - m_1(\theta)]^2 \mid \mathcal{D}\}$ ,

and approximate the first two terms by $f_2(\mathcal{D}) - f_{11}(\mathcal{D})$. The third term cannot be estimated directly; however, by averaging once more over all prior values of $\mathcal{D}$, we obtain $E\{[f_1(\mathcal{D}) - m_1(\theta)]^2\} = \mathrm{mse}[f_1(\mathcal{D})]$, which is a natural by-product of our analyses. In summary, then, we would use the following estimators for (10.1) and (10.2):

(10.4)  $v(\mathcal{D}) \approx f_2(\mathcal{D}) - f_1^2(\mathcal{D})$ ;

(10.5)  $\phi(\mathcal{D}) \approx f_2(\mathcal{D}) - f_{11}(\mathcal{D}) + \mathrm{mse}[f_1(\mathcal{D})]$ .
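As a minimal sketch of how the two estimators (10.4) and (10.5) would be evaluated once the credibility forecasts are in hand, the following Python fragment may help. The function names and the numerical forecast values are hypothetical and chosen only for illustration; they are not taken from the examples of this paper.

```python
def predictive_variance(f2: float, f1: float) -> float:
    """Estimator (10.4): v(D) is approximated by f2(D) - f1(D)**2."""
    return f2 - f1 ** 2


def forecast_error(f2: float, f11: float, mse_f1: float) -> float:
    """Estimator (10.5): phi(D) is approximated by f2(D) - f11(D) + mse[f1(D)]."""
    return f2 - f11 + mse_f1


# Hypothetical forecast values, for illustration only.
f1, f2, f11, mse_f1 = 1.2, 2.5, 1.3, 0.05

v_hat = predictive_variance(f2, f1)          # 2.5 - 1.44
phi_hat = forecast_error(f2, f11, mse_f1)    # 2.5 - 1.3 + 0.05
```

The two estimates differ only in how the squared mean is handled: (10.4) squares the forecast of the mean, while (10.5) uses the separate forecast $f_{11}$ and adds the preposterior mean-squared error.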

Bühlmann (1970, p. 100) also considers the problem of estimating the predictive variance. He breaks $v(\mathcal{D})$ into a "variance part" and a "fluctuation part", which, in our notation, are:

(10.6)  $v(\mathcal{D}) = [m_2(\mathcal{D}) - m_{11}(\mathcal{D})] + [m_{11}(\mathcal{D}) - m_1^2(\mathcal{D})]$ ,

the posterior-to-data version of $c = e + d$ (cf. (1.5)). He then approximates the first part by a one-dimensional credibility forecast using the unbiased sample variance, $\Sigma = (n-1)^{-1}\sum_i (x_i - \bar{x})^2 = t_2(\mathcal{D}) - t_{11}(\mathcal{D})$, i.e.,

(10.7)  $e(\mathcal{D}) = m_2(\mathcal{D}) - m_{11}(\mathcal{D}) \approx (1 - z_e)(m_2 - m_{11}) + z_e \Sigma$ .

The credibility factor, $z_e$, is a complicated function of $n$, but, by
making the simplifying assumption of a "normal excess" (e.g., the kurtosis
of $p(x \mid \theta)$ is that of the normal density for every $\theta$), he obtains a
simplified  form,  zg  *  (n  -  K)/(n  -  3)  ,  where  K  is  a  complicated  ratio 
of  marginal  moments. 
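The identity between the unbiased sample variance and the difference of the two sample second moments can be checked numerically. The sketch below assumes that $t_2$ is the average of the squared observations and that $t_{11}$ is the average of the cross-products $x_i x_j$ over all ordered pairs with $i \ne j$; these definitions are assumed here, not quoted from this section. The credibility factor `z_e` is simply passed in by the caller, and the observations are made up.

```python
from itertools import permutations
from statistics import variance


def sample_statistics(xs):
    """Return (t1, t2, t11) under the assumed definitions (requires n >= 2)."""
    n = len(xs)
    t1 = sum(xs) / n
    t2 = sum(x * x for x in xs) / n
    # Average of x_i * x_j over all ordered pairs of positions with i != j.
    t11 = sum(a * b for a, b in permutations(xs, 2)) / (n * (n - 1))
    return t1, t2, t11


def variance_part_forecast(e_prior, z_e, xs):
    """Form (10.7): (1 - z_e) * e + z_e * (t2 - t11), with z_e supplied by the caller."""
    _, t2, t11 = sample_statistics(xs)
    return (1.0 - z_e) * e_prior + z_e * (t2 - t11)


xs = [0.3, 1.7, 2.2, 0.9, 1.4]  # made-up observations
t1, t2, t11 = sample_statistics(xs)
unbiased_var = variance(xs)     # (n - 1)^{-1} * sum of squared deviations
```

With `z_e = 0` the forecast returns the prior value of $e$, and with `z_e = 1` it returns the sample variance, which are the two extremes between which (10.7) interpolates.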

The second factor,

(10.8)  $d(\mathcal{D}) = m_{11}(\mathcal{D}) - m_1^2(\mathcal{D}) = E\{[m_1(\mathcal{D}) - m_1(\theta)]^2 \mid \mathcal{D}\}$

is approximated by: first, replacing $m_1(\mathcal{D})$ by $f_1^*(\mathcal{D})$, and second, averaging over all prior values of $\mathcal{D}$, obtaining: $d(\mathcal{D}) \approx \mathrm{mse}[f_1^*(\mathcal{D})]$, giving finally:

(10.9)  $v(\mathcal{D}) \approx (1 - z_e)e + z_e[t_2(\mathcal{D}) - t_{11}(\mathcal{D})] + \mathrm{mse}[f_1^*(\mathcal{D})]$ .

With our extended use of these statistics, we could presumably improve Bühlmann's analysis by arguing in the same way that:

(10.10)  $v(\mathcal{D}) \approx f_2(\mathcal{D}) - f_{11}(\mathcal{D}) + \mathrm{mse}[f_1(\mathcal{D})]$ .

However, this is exactly the approximation (10.5) for $\phi(\mathcal{D})$, which must be larger than $v(\mathcal{D})$ if mean credibility is not exact! So, we would still prefer (10.4) for the estimate of the variance.

The difficult-to-estimate term, $d(\mathcal{D})$, is, in fact, the posterior-to-data predictive covariance, $E\{[x_{n+1} - m_1(\mathcal{D})][x_{n+2} - m_1(\mathcal{D})] \mid \mathcal{D}\}$, which we know must vanish with $n$ as the true value of $\theta$ is identified. For instance, with the simple-exponential family of Section 6, we have $d(\mathcal{D}) = e(\mathcal{D})d/(e + nd)$ or $v(\mathcal{D}) = e(\mathcal{D})[1 + (d/(e + nd))]$. And, in the general case, if $f_1(\mathcal{D})$ is close to $m_1(\mathcal{D})$, then we know that the average (preposterior) value of $d(\mathcal{D})$ is $\mathrm{mse}[f_1(\mathcal{D})]$, which probably vanishes like $\mathrm{mse}[f_1^*(\mathcal{D})] = ed/(e + nd)$.
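To see how quickly the term $ed/(e + nd)$ vanishes, the following sketch tabulates it for a few sample sizes. It assumes the usual one-dimensional credibility factor $z = nd/(e + nd)$, so that $(1 - z)d = ed/(e + nd)$; the values $e = 1.0$ and $d = 0.1$ are used purely as an illustration.

```python
def mse_mean_credibility(e: float, d: float, n: int) -> float:
    """Mean-squared error (1 - z) * d of the one-dimensional mean forecast."""
    z = n * d / (e + n * d)   # assumed one-dimensional credibility factor
    return (1.0 - z) * d      # algebraically equal to e * d / (e + n * d)


e, d = 1.0, 0.1               # illustrative variance components
table = {n: mse_mean_credibility(e, d, n) for n in (2, 10, 100, 10_000)}
```

For these values the error is already halved relative to $d$ at $n = 10$, and it decays like $e/n$ for large $n$.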

So, in short, we doubt if the accuracy issues raised here are important in any realistic application, and expect the errors in using (10.4), (10.5) to be of the same order of magnitude as the errors in the underlying predictions $f(\mathcal{D})$.

11.  NUMERICAL EXAMPLES


It should be remembered that important simplifications often occur in $D$ and $E(n)$ for the usual analytic forms assumed for the likelihood and the prior. For instance, where the likelihood is normal, with possibly random mean and variance, we have:

$m_3(\theta) = 3v(\theta)m(\theta) + m^3(\theta)$ ;
$m_4(\theta) = 3v^2(\theta) + 6v(\theta)m^2(\theta) + m^4(\theta)$ ;

where $m(\theta) = m_1(\theta)$ and $v(\theta) = m_2(\theta) - m_1^2(\theta)$.

From this, we see that all eleven moments in (1.4) can be expressed in terms of moments and cross-moments of $m(\theta)$ (up to order 4) and $v(\theta)$ (up to order 2).
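The two normal-moment identities above can be checked against the standard recursion for raw moments of a normal variable, $E[x^k] = m\,E[x^{k-1}] + (k-1)\,v\,E[x^{k-2}]$. The sketch below is only a numerical sanity check; the function names and the test values of $m$ and $v$ are arbitrary choices for illustration.

```python
def normal_raw_moments(m: float, v: float, order: int = 4):
    """Raw moments E[x^k], k = 0..order, of a normal with mean m and variance v."""
    moments = [1.0, m]
    for k in range(2, order + 1):
        # Standard recursion for normal raw moments.
        moments.append(m * moments[k - 1] + (k - 1) * v * moments[k - 2])
    return moments


def m3_formula(m: float, v: float) -> float:
    """Third raw moment as written in the text."""
    return 3 * v * m + m ** 3


def m4_formula(m: float, v: float) -> float:
    """Fourth raw moment as written in the text."""
    return 3 * v ** 2 + 6 * v * m ** 2 + m ** 4


m, v = 1.0, 2.0                 # arbitrary test values
raw = normal_raw_moments(m, v)  # [1, m, m2, m3, m4]
```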

The likelihoods introduced in Examples 1 through 6 of Section 3 have been characterized by Morris (1982) as the natural exponential families with quadratic variance functions, i.e., the variance is at most a quadratic function of the mean. From this, it follows that, for this family, the components of $m(\theta)$ in (5.2) are linearly-dependent functions of the parameter, and that $D$ is singular. For example, if the likelihood is Poisson($\pi$), then $m(\pi) = [\pi \,;\, \pi + \pi^2 \,;\, \pi^2]$.
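The singularity of $D$ in the Poisson case can be illustrated numerically. The sketch below builds the covariance matrix of $(\pi, \pi + \pi^2, \pi^2)$ when $\pi$ has a Gamma prior; the shape and rate values of 10 are an assumption chosen for illustration, and the helper names are hypothetical. Because the second component is the sum of the other two, the second row of the matrix is the sum of the first and third, and the determinant vanishes.

```python
def gamma_raw_moment(k: int, shape: float, rate: float) -> float:
    """E[pi^k] for pi ~ Gamma(shape, rate)."""
    out = 1.0
    for i in range(k):
        out *= (shape + i) / rate
    return out


def poisson_gamma_D(shape: float, rate: float):
    """Covariance matrix of (pi, pi + pi^2, pi^2) over the Gamma prior."""
    g = [gamma_raw_moment(k, shape, rate) for k in range(5)]
    var1 = g[2] - g[1] ** 2      # Var(pi)
    cov12 = g[3] - g[1] * g[2]   # Cov(pi, pi^2)
    var2 = g[4] - g[2] ** 2      # Var(pi^2)
    return [
        [var1, var1 + cov12, cov12],
        [var1 + cov12, var1 + 2 * cov12 + var2, cov12 + var2],
        [cov12, cov12 + var2, var2],
    ]


def det3(a) -> float:
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


D = poisson_gamma_D(10.0, 10.0)  # assumed illustrative hyperparameters
```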

We now consider three numerical examples that illustrate these ideas; in all examples, the joint credibility forecasts are exactly the Bayesian mean forecast, for all $n$. (However, we have not introduced this prior knowledge into the numerical calculations below!)


Example A:

Consider Example 7 from Section 3, the Normal likelihood with random mean and precision $\omega$, with Normal-Gamma prior, and with the following hyperparameters: $x_{01} = 10$ ; $n_{01} = 10$ ; $x_{02} = 21$ ; $\alpha = 6.5$. Note that we have chosen $\alpha = (n_{01} + 3)/2$ so that the predictive second moment will be in credibility form (3.14), (3.15). Numerically, we find the eleven marginal moments to be:

M = {1, 2.1, 1.1, 4.3, 2.3, 1.3, 12.037, 5.0033, 5.1033, 2.7589, 1.6367} ,

and the variance components are:

$d = E\{(n_0\omega)^{-1}\} = 0.1$ ; $e = E\{\omega^{-1}\} = 1.0$ .

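The covariance matrices of Section 5 are:

$$D = \begin{pmatrix} .10000 & .20000 & .20000 \\ .20000 & .69333 & .44889 \\ .20000 & .44889 & .42667 \end{pmatrix} ; \qquad E_w = \begin{pmatrix} 1.00000 & 2.00000 & 2.00000 \\ 2.00000 & 6.93333 & 4.48889 \\ 2.00000 & 4.48889 & 4.48889 \end{pmatrix} ,$$

and the remaining matrix has the single non-zero element 2.44444 in position (3,3). The independent time constants are $n_{01} = n_{02} = 10$, and $n_{011}(n) = 10.52 + (5.73/(n - 1))$.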
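The time constants quoted for this example can be reproduced from the diagonal entries of the matrices. The sketch below assumes that each independent time constant is the ratio of the corresponding diagonal element of the $E$ matrices to that of $D$, with the third one picking up an extra term in $1/(n - 1)$ from the element 2.44444; this reading is inferred from the printed numbers rather than taken from the formulas of the earlier sections. The credibility factor $n/(n + n_0)$ is likewise the usual one-dimensional form, assumed here.

```python
# Diagonal entries of the Example A matrices.
D_diag = (0.10000, 0.69333, 0.42667)
Ew_diag = (1.00000, 6.93333, 4.48889)
extra_33 = 2.44444  # single non-zero element of the remaining matrix

n01 = Ew_diag[0] / D_diag[0]
n02 = Ew_diag[1] / D_diag[1]


def n011(n: int) -> float:
    """Assumed form of the third time constant; requires n >= 2."""
    return (Ew_diag[2] + extra_33 / (n - 1)) / D_diag[2]


def credibility_factor(n: int, n0: float) -> float:
    """Usual one-dimensional credibility factor n / (n + n0)."""
    return n / (n + n0)


constant_part = Ew_diag[2] / D_diag[2]   # compare with 10.52
decaying_part = extra_33 / D_diag[2]     # compare with 5.73
factors = {n: credibility_factor(n, n011(n)) for n in (2, 10, 100, 10_000)}
```

Under these assumptions the computed factors increase toward one as $n$ grows, in line with the remark that the statistics become "fully credible" in the limit.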

For $n = 2, 10, 100$, and 10,000, Figure 1 shows the credibility matrix $Z$ for the three-dimensional forecasts of (4.2), together with their corresponding mean-squared errors, the diagonal terms from (9.2). Also shown are the corresponding independent forecast factors of (2.2), (6.1), (6.2), arranged in matrix format for easy visual comparison (and thus making (5.9) a general forecast formula, even with $Z$ diagonal); the corresponding mse's are also given, and can be gotten also from the diagonal terms of (9.2).

We remark that:

(1)  Because of previous results, $m_1(\mathcal{D}) = f_1(\mathcal{D}) = f_1^*(\mathcal{D})$ and $m_2(\mathcal{D}) = f_2(\mathcal{D}) = f_2^*(\mathcal{D})$, since $\alpha = (n_{01} + 3)/2$. Thus, the upper part of $Z$ is diagonal, with $z_{11} = z_{22}$ equal to the independent prediction factors. We also know that $m_{11}(\mathcal{D}) = f_{11}(\mathcal{D})$ (but not equal to $f_{11}^*(\mathcal{D})$, in general); here it is of interest to see how long a heavier weight is attached to $t_1(\mathcal{D})$ instead of the natural statistic, $t_{11}(\mathcal{D})$.

(2)  The mse's for the first two components are, of course, the same for both predictions. As might be expected, predicting second moments gives larger mse's than the mse for $f_1(\mathcal{D})$; however, the relative rate of decrease with $n$ is about the same. Furthermore, there is only about a 6% increase in mse for using $f_{11}^*(\mathcal{D})$ over the exact $f_{11}(\mathcal{D})$.

(3)  Both credibility factors approach the identity matrix as $n$ approaches infinity, as the statistics in $t(\mathcal{D})$ become "fully credible".


Example  B: 

Consider Example 4 from Section 3, the Exponential($\theta$), with Gamma prior, with hyperparameters: $x_{01} = 10$ ; $n_{01} = 10$. The marginal moments are:

M = {1, 2.2222, 1.1111, 8.3333, 2.7778, 1.3889, 47.619, 11.905, 7.9365, 3.9683, 1.9841} .

The covariance matrices are:

$$D = \begin{pmatrix} .11111 & .55556 & .27778 \\ .55556 & 2.99824 & 1.49912 \\ .27778 & 1.49912 & .74956 \end{pmatrix} ; \qquad E_w = \begin{pmatrix} 1.11111 & 5.55556 & 2.77778 \\ 5.55556 & 39.68254 & 15.87302 \\ 2.77778 & 15.87302 & 7.93651 \end{pmatrix} ,$$

and the remaining matrix has the single non-zero element 3.9685 in position (3,3). The hyperparameters were chosen to make $m_1 = 1.0$ and $n_{01} = 10.0$, as in Example A, but now, due to the change in distributions, we have $n_{02} = 13.24$, and $n_{011}(n) = 10.59 + (5.29/(n - 1))$.


Figure 2 shows again the results for $n = 2, 10, 100$, and 10,000, in a format similar to that of Figure 1.

Notice  the  following: 

(1)  As in Example A, $m_1(\mathcal{D}) = f_1(\mathcal{D}) = f_1^*(\mathcal{D})$; however, now both $f_2(\mathcal{D})$ and $f_{11}(\mathcal{D})$ use all three statistics, particularly $t_1(\mathcal{D})$ and $t_{11}(\mathcal{D})$. Now, as $n \to \infty$, we find the surprising result that $2t_{11}(\mathcal{D})$ is the preferred predictor for $m_2(\mathcal{D})$, rather than the "natural" estimator, $t_2(\mathcal{D})$; they both have the same expectation, but the former has smaller variance.

(2)  In fact, we can make the following stronger statements. As a consequence of the exponential assumption only, $m_2(\theta) = 2m_1^2(\theta)$ for all $\theta$, so that $m_2(\mathcal{D}) = 2m_{11}(\mathcal{D})$ for any prior. Assumption of a Gamma prior makes both predictions linear functions of $t(\mathcal{D})$, and, in fact, we see from Figure 2 that $z_{2j} = 2z_{3j}$ $(j = 1,2,3)$, so that $f_2(\mathcal{D}) = 2f_{11}(\mathcal{D})$ for all $\mathcal{D}$!

(3)  The mse's for independent predictions of the two second moments are, of course, larger than in the joint predictions, and worst for $f_2^*(\mathcal{D})$, as it is forced into using $t_2(\mathcal{D})$, rather than $t_{11}(\mathcal{D})$ as its sole predictor. This gives a relative degradation which climbs about 20%, but, at the same time, all mse's are decreasing with $n$ at about the same relative rate. Substituting $t_{11}(\mathcal{D})$ for the "natural" predictor of $f_2^*(\mathcal{D})$ would, of course, reduce the mse to four times that of $f_{11}^*(\mathcal{D})$, which at its worst value ($n = 2$), is only about 5% larger than the joint prediction.

(4)  The non-convergence of $Z$ to the identity matrix is the consequence of the previously-discussed fact that $D$ is singular. However, since $m_2 = 2m_{11}$, $2t_{11}(\mathcal{D})$ is ultimately "fully credible" as $n \to \infty$, i.e., no dependence upon prior moments remains in $f_2(\mathcal{D})$ in the limit. We have already proven this directly in (8.6).

FIGURE 2.  NUMERICAL RESULTS FOR EXAMPLE B, GAMMA-EXPONENTIAL

Example  C: 

Consider Example 3 from Section 3, the Poisson($\pi$), with Gamma prior, and hyperparameters: $x_{01} = 10$ ; $n_{01} = 10$. The marginal moments are:

M = {1, 2.1, 1.1, 5.62, 2.42, 1.32, 18.336, 6.776, 5.456, 3.036, 1.716} .


The covariance matrices are:

$$D = \begin{pmatrix} \cdots & .32000 & .22000 \\ \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix} ; \qquad E_w = \begin{pmatrix} \cdots & 3.20000 & 2.20000 \\ 3.20000 & 12.88000 & \cdots \\ 2.20000 & \cdots & \cdots \end{pmatrix} ,$$

with the remaining matrix having the single non-zero element 2.2 in position (3,3). The hyperparameters were again chosen to make $m_1 = 1.0$ and $n_{01} = 10.0$, but now $n_{02} = 12.31$, and $n_{011}(n) = 10.43 + (4.35/(n - 1))$. Figure 3 tabulates the results for $n = 2, 10, 100$, and 10,000 in the same format as previous examples.


FIGURE 3.  NUMERICAL RESULTS FOR EXAMPLE C, GAMMA-POISSON

We notice that:

(1)  As in Examples A and B, the first moment uses only $t_1(\mathcal{D})$, but the second moments use all three statistics, with $t_2(\mathcal{D})$ playing a decreasingly important role. In contrast to Example B, however, we now find that, as $n \to \infty$, $t_1(\mathcal{D}) + t_{11}(\mathcal{D})$ is the preferred predictor for $m_2(\mathcal{D})$, rather than $t_2(\mathcal{D})$.

(2)  This is a consequence of the assumption that the likelihood is Poisson, for then $m_2(\theta) = m_1(\theta) + m_1^2(\theta)$ for all $\theta$, so that $m_2(\mathcal{D}) = m_1(\mathcal{D}) + m_{11}(\mathcal{D})$ for any prior. It is the assumption of the Gamma prior that makes predictions using only linear functions of $t(\mathcal{D})$ exact, and in Figure 3 we can see that, in fact, $z_{2j} = z_{1j} + z_{3j}$ $(j = 1,2,3)$, so that $f_2(\mathcal{D}) = f_1(\mathcal{D}) + f_{11}(\mathcal{D})$ for all $\mathcal{D}$!

(3)  The mse's follow the pattern of Example B, with the mse of $f_2^*(\mathcal{D})$ becoming progressively relatively worse than its joint counterpart. Here, however, to improve the prediction error, one would probably have to include both $t_1(\mathcal{D})$ and $t_{11}(\mathcal{D})$, as it is not clear that just one of the latter would be an improvement over using just $t_2(\mathcal{D})$. Furthermore, neither of the other statistics would ever become "fully credible" as $n \to \infty$, as they are not individually equal in expectation to $m_2$, only in sum. Clearly, the best single statistic to use for $m_2(\mathcal{D})$ in the Poisson case is $t_1(\mathcal{D}) + t_{11}(\mathcal{D})$.
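The preference for $t_1 + t_{11}$ over $t_2$ in the Poisson case can be illustrated by a small simulation. The sketch below holds the Poisson parameter fixed at an assumed value of 1 (so it looks only at sampling variability for a given parameter, not at the full Bayesian forecast), uses an assumed sample size of 10, and draws Poisson variates with Knuth's multiplication method. All names, the seed, and the number of replications are illustrative choices. Both statistics should average close to $\pi + \pi^2 = 2$, with the combined statistic showing the smaller spread.

```python
import math
import random
from statistics import mean, variance


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) variate by Knuth's method (suitable for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(lam: float = 1.0, n: int = 10, reps: int = 20000, seed: int = 0):
    """Return replicated values of t2 and of t1 + t11 for samples of size n."""
    rng = random.Random(seed)
    natural, combined = [], []
    for _ in range(reps):
        xs = [poisson_draw(rng, lam) for _ in range(n)]
        s1 = sum(xs)
        s2 = sum(x * x for x in xs)
        t1, t2 = s1 / n, s2 / n
        t11 = (s1 * s1 - s2) / (n * (n - 1))  # average cross-product, i != j
        natural.append(t2)
        combined.append(t1 + t11)
    return natural, combined


natural, combined = simulate()
var_natural = variance(natural)
var_combined = variance(combined)
```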


12.  COMPUTATIONAL  STRATEGIES;  CONCLUSION 


The last two examples show that some care must be exercised if one wishes to make independent forecasts where $p(x \mid \theta)$ is assumed to be in the QVF-NEF family, remembering that this also includes (fixed numbers of) convolutions of Examples 1-6, such as the Negative Binomial with fixed shape parameter. One can, of course, use the combination of "natural" statistics appropriate to the assumed likelihood. This is particularly important when we also assume that the natural conjugate prior is appropriate.

On the other hand, for an arbitrary prior, the moments will not be linear functions of the statistics, so that all positions of $Z$ would be non-zero anyway, as would also be the case if all moments were from empirical studies. In these cases, $Z$ would approach the identity matrix as $n \to \infty$, and we expect that the independent forecasts (2.2), (6.1), (6.2) would be equally good (or equally bad) as the joint forecasts. Clearly, more computational experience is needed in making this decision.

The great advantage of the joint forecast is that it can always be used if $n \ge 2$, and, if there is a tendency for certain combinations of statistics to dominate, it will be revealed automatically. Of course, if $n = 1$, we are forced to use only $t_1(\mathcal{D}) = x_1$ and $t_2(\mathcal{D}) = x_1^2$; the predictive power will be weak anyway, in most practical cases.

In summary, we have presented an easily implemented three-dimensional credibility formula that simultaneously approximates the first and second moments of the Bayesian predictive density. While this approach requires eleven prior moments from the collective, this calculation is simplified when familiar analytic forms are assumed for the likelihood. Previous work has shown that the credibility mean is exact in $t_1(\mathcal{D})$ for a wide class of likelihoods and priors in which the sample mean is the sufficient statistic; here we have shown that the second-moment credibility predictions are also exact for five widely-used likelihoods and their natural conjugate priors, when using the three "natural" statistics in $t(\mathcal{D})$.

For these and other reasons, we believe that these linear prediction formulae will turn out to be robust in other cases where the distributions are empirical, or where the exact predictions are known to be non-linear in the data. We suspect also that, in most cases, it will be reasonable to use the simplified, independent forecasts, paying due attention to the remarks above. The authors look forward to hearing from those who apply this approach to actual prediction problems.